COSINE: non-seeding method for mapping long noisy sequences

Authors

  • Pegah Tootoonchi Afshar
  • Wing Hung Wong

Abstract

Third-generation sequencing (TGS) technologies are highly promising, but their long and noisy reads are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases from the similarity of the distributions of their short k-mers (k = 3-4) along the sequences. Results on simulated and real data show that COSINE achieves high sensitivity and specificity over a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
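The abstract does not spell out COSINE's scoring function, so the following is only a minimal Python sketch of the general idea: comparing two stretches of sequence by their short k-mer distributions rather than by exact seed matches. As a simplifying assumption, each stretch is summarized by a vector of k-mer counts over ACGT, and two stretches are compared with plain cosine similarity. The helper names `kmer_profile`, `context_similarity` and `best_window` are hypothetical. This is not the authors' algorithm.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq: str, k: int = 3) -> list[int]:
    """Count each of the 4**k k-mers over ACGT, in fixed lexicographic order.

    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical score: cosine similarity of the two k-mer count profiles."""
    return cosine_similarity(kmer_profile(a, k), kmer_profile(b, k))


def best_window(read: str, reference: str, k: int = 3, step: int = 1) -> tuple[int, float]:
    """Slide a read-length window along the reference and return
    (offset, score) of the first window with the highest similarity.

    Each window's profile is recomputed from scratch; a rolling
    update of the counts would be faster but is omitted for clarity.
    """
    width = len(read)
    read_profile = kmer_profile(read, k)
    best_offset, best_score = 0, -1.0
    for start in range(0, len(reference) - width + 1, step):
        window_profile = kmer_profile(reference[start:start + width], k)
        score = cosine_similarity(read_profile, window_profile)
        if score > best_score:
            best_offset, best_score = start, score
    return best_offset, best_score
```

For example, with `ref = "C" * 8 + "ACGTTGCA" + "C" * 8`, the call `best_window("ACGATGCA", ref)` still places the read at offset 8 despite its substitution. The error disturbs only 3 of the read's 6 overlapping 3-mers, so half of the profile survives. This illustrates why short k-mers (k = 3-4) tolerate noisy reads better than long exact seeds.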

Similar resources

Second order statistics spectrum estimation method for robust speech recognition

A second-order statistics spectrum estimation (SOSSE) method for speech enhancement is presented. The DFT amplitude spectral components of the noisy signal are assumed to be random variables. After the first- and second-order statistics of the noise-only spectrum are estimated, the noisy signal spectrum is enhanced. As a reference, a fast discrete cosine transform based signal subspace (FDCTSS) m...

Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the many functional products of the eukaryotic genome. Indeed, a larger part of the human genome is transcribed into non-coding sequences than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...

Design and Simulation of a Modified 32-bit ROM-based Direct Digital Frequency Synthesizer on FPGA

This paper presents a modified 32-bit ROM-based Direct Digital Frequency Synthesizer (DDFS). The maximum output frequency of a DDFS is limited by the structure of the accumulator used in its architecture. The hierarchical pipeline accumulator (HPA) presented in this paper has a shorter propagation delay than conventional structures. Therefore, it results in both higher maximum ope...

A new method for separating and classifying cancerous and non-cancerous DNA sequences using LPC- and SVD-based algorithms

The growing prevalence of cancer has encouraged researchers to examine several aspects of this malignant disease. The genetically induced nature of cancer heightens the importance of studying intracellular components. This paper aims to identify specific and unique features of long DNA sequences by employing well-established DNA sequence analysis techniques. The ...

P87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders

Precise interpretation of transcriptome sequences in several species has shown that the major part of the genome is transcribed; however, only a small fraction of the transcribed sequences have open reading frames that are conserved during evolution. It is therefore unlikely that many of the transcribed sequences code for proteins. Among all human non-coding transcripts, at least 10000...

Journal: Volume 45

Publication date: 2017